== LiteComment ==
  {{Author:       Proger_XP}}
  {{DateWritten:  7 Feb 2011}}
  {{Tags:         en, large, PHP, markup}}
  {{Category:     -1}}

  {{Nutshell:     "A lightweight text formatter focused on extreme simplicity and speed."}}

  {{Meta image=logo.png}}
  {{NewStyle      lc=litecomment}}

  {{TOC}}

  Nearly a year ago I needed a simple, easy-to-use formatting script for
((http://ibsearch.i-forge.net/ The Imageboard Search Engine ^^v2^^)): it had to make
user-input texts look neat without much processing overhead and without requiring
anyone to learn a markup language beforehand.
  That's when I came up with this script, and I called it **LiteComment**.

  Long after creating it I learned about **((WP:Markdown))**'s markup and was very
surprised to discover that its formatting was extremely similar to that of **LiteComment**.
However, the two are still slightly different.

  //If you need a full-fledged formatting framework for serious processing take a look at **((http://uverse.i-forge.net/wiki UverseWiki))** (**a modelling text processor**) which, among other places, powers this blog and the ((http://i-tools.org **i-Tools.org** project)).//

  **((litecomment.rar Download **LiteComment** PHP script)).** You can check the
**LiteComment's sandbox** ((litecomment_test.php here)).

== Features ==
  * Extremely fast processing - using just **one regular expression** - a single call
    to ((php:preg-replace-callback preg_replace_callback)) per document.
    * **100 KiB** of moderately formatted English text is rendered into HTML within **~0.2 sec**.
    * The whole module is under 550 lines of PHP code (60 of which are taken by
      the regular expression in %%/x%% mode), including comments and blank lines.
  * Intuitive and intentionally limited markup.
  * **URLs never clutter output** - up to 2 words before a URL are used as its caption
    or, if there are none, only the domain name is shown instead of the full URL string.
    * Examples: %%link here www.google.com%% => %%(lc) link here www.google.com%%;
                %%www.google.com%% => %%(lc) www.google.com%%.
  * Some essential **((#typographics))** (dashes, ellipsis and such).
  * Ability to prevent formatting of strings by placing them into %%(lc)`code blocks`%%.
  * Several options for **automatic ((#e-mail e-mail masking))** (with **JavaScript**, without it, or both):
    %%(lc antispam=js)my@email.org%%.
  * **((#Extensibility))** (albeit limited) and **special inset syntax** for ((#custom)) commands:
    %%[- arg | name=value | arg | ... -]%%. By default it creates an explicit link.
  * **LiteComment** can optionally process **nested statements** (for chosen formatting elements);
    when this is turned off %%>*bold*%% is still an inline quote but its content is
    output as plain text, without bolding.
  * Full **Unicode (UTF-8)** support.
  * **Valid XHTML** with support for both semantic and %%<span>%%ish markup ([[#configuration customizable]]).
  * **CSS-ready**; each and every element produced by the formatter has a set of CSS classes.

== Formatting ==
%%(mirror; lc minHeadingLevel=1)
  == Heading of level 1 ==
  === Heading of level 2 ===
  ... up till:
  ======= The smallest heading of level 6 =======

  You can specify any number of "="s on
  the right or omit them altogether:
  ======= Like this ==
  ======= ...this =
  ======= ...or this!

  *bold text*, `monospaced` (code)

  `
  Multiline code
  (unformatted text)
  `

  "
  Multiline quotation
  (blockquote)
  "

  >>> inline quote
  >> more recent saying
  >most recent (spaces after > are optional)

  Citations: "Long time ago when the world was ruled by Nevermore..."
  Or using single apostrophes: 'Long time ago'...

  Links - http://google.com   Or this www.embarcadero.com
  Up to 2 preceding words are used for the caption: www.ya.ru
  [- http://google.com | Caption of the link -]
  E-mail masking: e@mail.em.com

  Pictures: icon.png, [- logo.png | Image-link -]
  With a thumbnail: [- logo.png | icon.png -]

  Rules - 4 types (CSS classes); they are lines
  consisting of 3 or more identical symbols:
  ---
  ~~~~
  =====
  ++++++
%%

=== Typographics ===
%%(mirror; lc)
  Dashes:       short (en) - dash, long (em) -- dash, the ---- same
  Ellipsis:     2.. (one removed) or more... dots......
  Symbols:      (c) (r) (p) (tm)
  Numbers:      #1  #33  1st  55th  #100th
  Matrix:       10*10  10x10  10X10  10 x 10
  Division:     10/10  10:10  10 / 10 (spaces are fine)
  Plus-minus:   +-  +-5  +- 5

  Arrows (number of "=" doesn't matter if it's >= 2):
  <=>  <=  =>  <====  ====>  <=====>

  Flood protection against repeated chars:
  OMG!!!  WTF?!?  Erm????
%%


== Usage ==
  The simplest way of formatting your text is to call the static %%Format%% method:
%%(php)
  require 'litecomment.php';
  $str = htmlspecialchars('*bold* <link>');
  echo LiteComment::Format($str);   // => <strong>bold</strong> &lt;link&gt;
%%
  //%%(php)$str%% could be an array (see the [[#methods method's description]]).//

  It leaves you no chance to customize things but it's simple. In fact, there's
not much room for tweaking a class instance anyway - most settings are static. Moreover,
there are few reasons to construct the class and then operate on it because an instance
is constructed once per document and **should not** be used thereafter.

  //**Note:** before calling format methods **you need to escape HTML on your own**.// You'd
normally do this with ((php:htmlspecialchars)) like in the example above.
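
  As a minimal sketch of that escaping step: the helper below (%%(php) escapeForFormatting()%% is a made-up name, not part of **LiteComment**) accepts a string or an array, just like %%Format%%. I'm assuming here that the formatter wants %%"%% escaped but %%'%% left as typed, so the flags are passed explicitly - since PHP 8.1 ((php:htmlspecialchars)) escapes apostrophes by default as well.
%%(php)
  // Hypothetical helper (not part of LiteComment): escapes a string or an
  // array of strings before it is handed to the formatter.
  function escapeForFormatting($text) {
    if (is_array($text)) {
      return array_map('escapeForFormatting', $text);
    }
    // ENT_COMPAT escapes " but leaves ' alone (the default before PHP 8.1).
    return htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
  }

  $escaped = escapeForFormatting(array('*bold* <link>', "'quoted' \"text\""));
  // $escaped can now be passed to LiteComment::Format().
%%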

=== Methods ===
.(table_dfn hide_this_paragraph)
  = %%(php) static Format($text)%% == Returns formatted HTML;
    %%(php) $text%% can be **a string or an array of strings** (just like in [[php:preg-replace preg_replace]]) -
    an array will have all of its items formatted.
  = %%(php) SetAntiSpamMode($mode)%% == Sets ((#e-mail e-mail masking mode)); by default
    can be one of the following:
    = obfuscate (**default**) == Puts %%<a>%% with obfuscated email (e.g. "mail~@~at.here~com").
    = asis Puts %%<a>%% with plain e-mail address, welcoming the spammers.
    = js Puts obfuscated %%<a>%% for non-scripting users and %%<a>%% with
      plain address using **JavaScript**. Usually the best way but it nearly doubles the size
      of each e-mail link in the source; that's why %%obfuscate%% is the default mode.
    = jsonly Puts plain %%<a>%% by means of **JavaScript** alone; e-mails are hidden from
      non-scripting users.
  = %%(php) SetFileExtensions($extstr)%% == Sets the list of extensions that will be
    recognized in the source document and parsed as links. Must be a **regular expression**
    prepared to be inserted into brackets, e.g.: %%($extstr)%% -> %%(bmp|png|zip)%%.
    **Default value:** %%w?bmp|xbm|gif|jpe?g|png|svg%%.
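
  To see what "prepared to be inserted into brackets" means in practice, here's a small stand-alone sketch. %%(php) hasKnownExtension()%% is a hypothetical function and its pattern is not the actual **LiteComment** regexp; it only shows how an extension string behaves once wrapped into %%(...)%%. Note that %%/%% is the delimiter in this sketch so it must not appear in %%(php) $extstr%%.
%%(php)
  // Hypothetical check (not LiteComment's actual regexp): $extstr is placed
  // inside brackets exactly as described above.
  function hasKnownExtension($name, $extstr) {
    return preg_match('/\.(' . $extstr . ')$/i', $name) === 1;
  }

  $default = 'w?bmp|xbm|gif|jpe?g|png|svg';
  $isJpeg     = hasKnownExtension('photo.jpeg', $default);         // true: jpe?g covers jpg and jpeg
  $isZip      = hasKnownExtension('pack.zip', $default);           // false: zip isn't listed
  $isZipAdded = hasKnownExtension('pack.zip', $default . '|zip');  // true after extending the list
%%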

=== Properties ===
  As most properties are static and represent various settings, see the ((#Configuration))
section for their description.


== Configuration ==
  .(table_dfn)
  The following **static properties** of %%LiteComment%% class can be of interest:
  = maxQuoteLevel Inline quotes are defined using a number of %%>%% symbols at line
    start: %%>>hello!%%. This property sets the level after which the quote level wraps
    around: by default it's 4 so %%>quote%% is of level 1, %%>>>>quote%% is of level 4
    and %%>>>>>quote%% is again of level 1.
    * this only affects CSS class name.
  = minHeadingLevel == Sets the minimum heading level (for the %%<h#>%% tag); by default it's
    3 meaning that %%==heading==%% produces %%<h3>heading</h3>%%, %%=====heading==%%
    produces %%<h6>%% and everything above, e.g. %%======heading==%%, also produces
    an %%<h6>%% tag.
  = maxHeadingLevel See %%minHeadingLevel%% above.
  = tagAliases Controls the tag names used, like an alias table; by default they're set
    to match semantic markup guidelines (e.g. %%<em>%% for italic) but you can change
    them to %%<div>%%s and %%<span>%%s (or anything else) to rely on
    CSS classes alone (which are attached to each tag regardless of its name).
  = fileExtensions This property can't be changed directly as it affects the regular
    expression; use ((#methods %%SetFileExtensions%%)) method to set it.
  = jsAntiSpamFuncName Sets the name of the **JavaScript** function that will be called to
    output a ((#emails masked e-mail address)) (only for masking methods like %%js%% and %%jsonly%%).
    That function will be passed 3 arguments: %%'$account', '$domain', '$zone'%%
    (e.g. %%'my', 'e.mail', 'net'%% for "my@e.mail.net").
    * when set to an empty value the default implementation is used, which simply outputs an
      e-mail link built from the given components: %%(lc antispam=js) my@e.mail.net%%.
  = antiSpamJSDefinition Definition for default **JavaScript** masking function inserted into resulting
    HTML if %%jsAntiSpamFuncName%% is unset.
  = leftWordBoundaryChars Defines the list of symbols which are considered "left word boundary".
    May contain regular expression-specific characters (they're quoted). **For example**,
    %%"*bold*"%% is a quote but will only be formatted as bold if %%"%% is listed both in
    left and right word boundary lists.
  = rightWordBoundaryChars Defines the list of symbols which are considered "right word boundary".
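
  The wrap-around of %%maxQuoteLevel%% and the capping of heading levels described above boil down to simple arithmetic. The two functions below are hypothetical (they aren't in **LiteComment**) and assume that %%maxHeadingLevel%% defaults to 6, which is my reading of the %%<h6>%% example:
%%(php)
  // Hypothetical helpers illustrating the two rules; the names and the
  // maxHeadingLevel default of 6 are assumptions, not LiteComment's code.
  function quoteCssLevel($markerCount, $maxQuoteLevel = 4) {
    // 1..4 stay as they are, 5 wraps to 1, 6 to 2 and so on.
    return ($markerCount - 1) % $maxQuoteLevel + 1;
  }

  function headingLevel($equalSigns, $minHeadingLevel = 3, $maxHeadingLevel = 6) {
    // "==" is the shallowest heading; each extra "=" goes one level deeper.
    return min($minHeadingLevel + $equalSigns - 2, $maxHeadingLevel);
  }

  echo quoteCssLevel(4), ' ', quoteCssLevel(5), "\n";                       // 4 1
  echo headingLevel(2), ' ', headingLevel(5), ' ', headingLevel(6), "\n";   // 3 6 6
%%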

  .(table_dfn)
  The following **instance properties** of %%LiteComment%% class can be of interest:
  = keepFormatChar If %%(php)true%%, ((#simple)) format characters are kept: %%*bold*%% =>
    %%(lc keepFormatChar=1)*bold*%%. If %%(php)false%% they're removed: %%(lc keepFormatChar=0)*bold*%%.
  = recursive Can be an **array**, %%(php)true%% (equivalent to all array elements set to true)
    or %%(php)false%%. If %%(php)false%%, nested formatting isn't processed, which is extremely
    quick (1 regexp run per source) but %%"a *b* c"%% will result in %%(lc recursive=0)"a *b* c"%% while if this is %%(php)true%% it will be %%(lc recursive=1)"a *b* c"%%.
  = externalLinkAttributes A string that is added to %%<a>%% tags that point to external
    resources. **By default** it's %%target="_blank"%% but you can add more and make it, for
    example, ---%%target="_blank" rel="nofollow"%%.


== Extensibility ==
  Although **LiteComment** is intentionally limited in features it nevertheless has
a few tricks in its pockets (and it has like 65 //pockets//) that allow you to extend
the markup in different ways.

  The most brute-force approach is, obviously, changing the regexp **LiteComment** uses but
you'll also need to shift all pocket offsets after inserting new capturing brackets.
That's been made easier thanks to %%(php) LiteComment->MatchesFrom()%% but still requires
some effort.
  Besides, I've left a few "extensible air holes" just for this occasion.

=== Custom commands via //special insets// ===
  This is the most common and flexible way to add your own markup or features. It uses
what I call "//special insets//". Their syntax is:
  %%[- arg | arg | name=value | ... -]%%

  //If an argument doesn't have a name (no %%=...%% part) it'll be assigned an index.//
Spaces after %%[-%%, before %%-]%% and around %%|%% are optional.
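
  Here's roughly how such a string could be split into arguments. %%(php) parseInsetArgs()%% is a hypothetical stand-in written for this article, not the real %%SpecialStrToSettings%%, so treat it as a sketch of the rules above rather than the actual implementation:
%%(php)
  // Hypothetical stand-in for SpecialStrToSettings(): splits on pipes,
  // trims optional spaces, and gives unnamed arguments numeric indexes.
  function parseInsetArgs($contents) {
    $settings = array();
    foreach (explode('|', $contents) as $part) {
      $part = trim($part);
      $eq = strpos($part, '=');
      if ($eq === false) {
        $settings[] = $part;              // unnamed: next free index
      } else {
        $name = trim(substr($part, 0, $eq));
        $settings[$name] = trim(substr($part, $eq + 1));
      }
    }
    return $settings;
  }

  $args = parseInsetArgs(' logo.png | Image-link | title=My logo ');
  // => array(0 => 'logo.png', 1 => 'Image-link', 'title' => 'My logo')
%%
  Splitting on the first %%=%% also explains why a URL containing %%=%% ends up with a non-numeric key and has to be glued back together by the handler.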

  **How to add your own handler?** Two methods of %%LiteComment%% class deal
with special insets:
  = %%(php) FormatSpecialInsetInHTML(&$contents)%% == The main routine; %%(php) $contents%%
    is what was found inside %%[-...-]%%.
  = %%(php) SpecialStrToSettings(&$str)%% == Gets called by the above method; splits string
    into parameters separated by pipes (%%|%%), assigning names to them (%%name=value%%).

  So to add your command you need to go to %%(php)FormatSpecialInsetInHTML%% and examine its code:
%%(php)
  $settings = $this->SpecialStrToSettings($contents);

  $urlKey = array_shift(array_keys($settings));
  $url = array_shift($settings);
  if (!is_int($urlKey)) {
    $url = "$urlKey=$url";  // URL contains '=', reinsert it.
  }

  $title = &$settings['title'];
  $title or $title = &$settings[0];
  return $this->MakeHTMLLink($url, $title);
%%

  Now your actions depend on how you want to extend the //special inset//.
  1. The first line splits the string into a command and its arguments. If you want to
     **override the default syntax** of //special inset// (%%[- cmd | arg=value | arg | ...-]%%)
     //you would want to insert your custom code right before the first line//.
  2. The rest of the method is the **default handler which inserts links**
     ([[#formatting syntax]]: %%[- URL | caption-]%% or %%[- URL | title=caption -]%%).
     So the next block takes the first argument off the argument list (it considers it a URL).
     //If you want to have a command like %%[- my_command | arg | ... -]%% you need to add your code after this block - %%(php) $url%% will contain the name of the command (which is the first argument).//
  3. The last block **searches for the %%title%% argument**, taking the argument of index #0
     if none is found, and finally inserts a link. //You probably won't need to change anything here.//

==== Embedding YouTube videos ====
  Let's say we want to be able to embed YouTube videos from **LiteComment** formatting.
The syntax can really be anything within the %%[-...-]%% construction ([[#custom //special inset//]])
so I've chosen this one: %%[- youtube | myCyJJdhhDk -]%%.

  First, let's locate the ((#custom %%FormatSpecialInsetInHTML%%)) method and make changes
there. Having the code of this function before our eyes (in the previous section) we'll add
our code after the 2nd block:
%%(php)
  if (strtolower($url) === 'youtube') {
    $id = $settings[0];
    $html = '<iframe title="YouTube video player" width="480" height="390"
                     src="http://www.youtube.com/embed/'.$id.'?rel=0"
                     frameborder="0" allowfullscreen></iframe>';
    return $html;
  }

  // original code follows:
  $title = &$settings['title'];
  ...
%%

  **That's all!** Now we can use this syntax to insert a //YouTube video//:
%%
  Here's a tutorial on getting started in well-known ModPlug Tracker:
  [- youtube | myCyJJdhhDk -]
%%
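  Since the ID goes straight into an HTML attribute it doesn't hurt to check it first. The function below is a hypothetical helper (not part of **LiteComment**); the pattern of 11 characters from %%A-Z a-z 0-9 _ -%% is my assumption based on how YouTube IDs look, not a documented guarantee:
%%(php)
  // Hypothetical helper: builds the embed code only for IDs that look valid.
  // The 11-character [A-Za-z0-9_-] pattern is an assumption based on what
  // YouTube IDs look like, not a documented guarantee.
  function youtubeEmbedHTML($id) {
    if (!preg_match('/^[A-Za-z0-9_-]{11}$/D', $id)) {
      return '';   // refuse anything that could break out of the attribute
    }
    return '<iframe title="YouTube video player" width="480" height="390" '.
           'src="http://www.youtube.com/embed/'.$id.'?rel=0" '.
           'frameborder="0" allowfullscreen></iframe>';
  }

  $html = youtubeEmbedHTML('myCyJJdhhDk');   // the <iframe> code
  $none = youtubeEmbedHTML('"><script>');    // '' - rejected
%%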

===== Bonus =====
  To demonstrate other possibilities of extending [[#custom //special insets//]] let's
say that I also want to support this shorter syntax: %%[- youtube myCyJJdhhDk -]%%.
Here the difference is that the video's ID ("myCyJJdhhDk") is no longer a separate argument
so we need to do the parsing on our own.
  That's not difficult at all, though - simply insert the following code right at the
beginning of the %%(php)FormatSpecialInsetInHTML%% function:
%%(php)
  if (stripos($contents, 'youtube ') === 0) {
    $id = substr($contents, strlen('youtube '));
    // the rest is the same as in the previous example with [-youtube | ID-]:
    $html = '<iframe title="YouTube video player" width="480" height="390"
                     src="http://www.youtube.com/embed/'.$id.'?rel=0"
                     frameborder="0" allowfullscreen></iframe>';
    return $html;
  }

  // original code follows:
  $settings = $this->SpecialStrToSettings($contents);
  ...
%%
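  If you'd rather accept both syntaxes in one place, and not depend on whether %%(php) $contents%% arrives with spaces around it (I make no assumption about that here), a single regular expression can do the job. %%(php) youtubeIdFromInset()%% is a hypothetical helper, shown only as a sketch:
%%(php)
  // Hypothetical helper: returns the video ID for "youtube ID" and
  // "youtube | ID" alike, or null when $contents is something else.
  function youtubeIdFromInset($contents) {
    if (preg_match('/^\s*youtube\s*[\s|]\s*([^\s|]+)\s*$/i', $contents, $match)) {
      return $match[1];
    }
    return null;
  }

  $short = youtubeIdFromInset('youtube myCyJJdhhDk');          // 'myCyJJdhhDk'
  $piped = youtubeIdFromInset(' YouTube | myCyJJdhhDk ');      // 'myCyJJdhhDk'
  $other = youtubeIdFromInset('http://google.com | Caption');  // null
%%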

  The possibilities for extending //special insets// are nearly endless.


=== Representing URLs as something else than text ===
  By default when you format a URL (like %%http://google.com%% or %%[- goo.com -]%%)
you'll get a link with a text caption. However, this is boring and sometimes we want
to **display a thumbnail** pointing to the full image - a text caption just isn't good enough.
  And that's what you can do - **make plain URLs look different**.

  By default **LiteComment** already includes handlers that will show URLs with picture
extensions (such as %%.png%%) as images (%%<img />%%) so that when you write:
%%picture.jpg%% or %%[-http://my-home | thumb.png-]%% you'll see an image linking to
the actual page.

  You can extend this by adding your own handlers (//handlers are triggered based on file (URL) extension//).
  Adding a custom handler is as simple as adding a method named
%%(php)HTMLByExt_<EXTENSION>%% to the %%LiteComment%% class. For example, if we want to
provide a download counter for archives and also a neat icon before the link we can add
this method for %%ZIP%% archives:
%%(php)
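  function HTMLByExt_ZIP($url) {
    // Load the counters; a missing or corrupt file yields an empty list.
    $file = 'dl-counter.txt';
    $counts = is_file($file) ? unserialize(file_get_contents($file)) : array();
    if (!is_array($counts)) { $counts = array(); }

    // Show the count as it was before this call, starting from 0.
    $thisCount = isset($counts[$url]) ? $counts[$url] : 0;
    $counts[$url] = $thisCount + 1;
    file_put_contents($file, serialize($counts), LOCK_EX);

    return '<img src="images/zip-link.png" alt="ZIP" />'.basename($url).
           " ($thisCount downloads so far)";
  }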
%%
  Remember that the link (%%<a>%%) will be added automatically. Also, since this
method is called when formatting texts, **the counter won't update if you're caching**
formatted HTML (not that caching is necessary with the kind of speed **LiteComment** has).

  Now the following snippet will be neatly formatted:
%%
  In this archive: litecomment.zip you'll find the software with all necessary instructions.
%%

  **Note that LiteComment will only recognize extensions in-text that were registered using ((#methods %%(php) SetFileExtensions()%%)).**
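
  To illustrate the naming rule, here's a stand-alone sketch of how a handler could be looked up by extension. %%UrlHandlersSketch%% and %%(php) methodFor()%% are hypothetical names made up for this example; I'm not claiming that **LiteComment** resolves its handlers exactly this way:
%%(php)
  // Self-contained illustration; UrlHandlersSketch and its methods are
  // hypothetical and only mimic the HTMLByExt_<EXTENSION> naming rule.
  class UrlHandlersSketch {
    // Returns the handler method name for $url, or null if none exists.
    function methodFor($url) {
      $path = (string) parse_url($url, PHP_URL_PATH);
      $ext = strtoupper(pathinfo($path, PATHINFO_EXTENSION));
      $method = "HTMLByExt_$ext";
      return ($ext !== '' && method_exists($this, $method)) ? $method : null;
    }

    function HTMLByExt_ZIP($url) {
      return 'ZIP archive: '.basename($url);
    }
  }

  $handlers = new UrlHandlersSketch;
  $method = $handlers->methodFor('http://example.com/files/litecomment.zip');
  echo $method ? $handlers->$method('litecomment.zip') : 'no handler';
  // prints: ZIP archive: litecomment.zip
%%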

==== Is it really by extension?.. ====
  Erm, well, when I said that handlers are triggered based on file extension I tricked
you a little :) They're actually triggered based on the whole URL and "extension trigger"
is just the simplest method of adding a new handler.
  What do I mean?

  //Let's say we want to warn users about links to a particular site.// Such URLs don't
have to be of one extension - just linking to the same resource. Say, %%spammer.org%%.
  We'll start off with locating %%(php)GetHTMLForURL%% method and examining its code:
%%(php)
  if ($methodName = $this->GetHTMLFileMethodFor($url)) {
    return $this->$methodName($url);
  }
%%
  Pretty straightforward, eh? This function accepts %%(php) $url%% argument which
holds the entire URL that was passed to //[[#custom special inset]]// (%%[-url|caption-]%%) or
was found in the text (like %%www.site.ru/file%%). Now we know what to do:
**add an extra condition to the beginning of this method**:
%%(php)
  if (stripos($url, 'spammer.org') !== false) {
    return '<em>This site might harm your system.</em>';
  } elseif ($methodName = $this->GetHTMLFileMethodFor($url)) {
    // original code follows.
%%
  Now each link to %%spammer.org%% will have that text included: "Visit spammer.org"
-> %%Visit <a href="..."><em>This site might harm your system.</em></a>%%.
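
  One caveat: a plain substring search also fires on %%notspammer.org%% or on any URL that merely mentions the domain in its query string. If that matters, the host can be compared instead. %%(php) urlIsOnDomain()%% below is a hypothetical helper, a sketch that isn't part of **LiteComment**:
%%(php)
  // Hypothetical helper: true if $url points at $domain or a subdomain of it.
  function urlIsOnDomain($url, $domain) {
    // Scheme-less URLs such as "www.site.ru/file" need a scheme for parse_url().
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
      $url = "http://$url";
    }
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $domain = strtolower($domain);
    $suffix = ".$domain";
    return $host === $domain
        || substr($host, -strlen($suffix)) === $suffix;
  }

  $direct    = urlIsOnDomain('http://spammer.org/page', 'spammer.org');             // true
  $subdomain = urlIsOnDomain('www.spammer.org/file', 'spammer.org');                // true
  $lookalike = urlIsOnDomain('http://notspammer.org/', 'spammer.org');              // false
  $mention   = urlIsOnDomain('http://example.com/?ref=spammer.org', 'spammer.org'); // false
%%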


=== ((#simple)) New simple format (e.g. italic) ===
  A //"simple format"// is a text placed between 2 identical strings. For example, default
simple formats are bold (%%*bold*%%) and preformatted (%%`code`%%) texts. As you can see
they are created by the %%*%% and %%`%% symbols respectively. Note that a simple format doesn't
have to use a single symbol - it can be a string (e.g. %%##%%).
  You can easily add a new simple format if it follows this rule.

  Let's say we want to underline text. It'll have this syntax: %%_underlined_%% and it will
be using %%<ins>%% tag.

//**Side note:** %%<u>%% was not part of the HTML5 drafts at the time of writing (it has since returned with a non-presentational meaning) while %%<ins>%% is displayed underlined in all browsers as far as I have tested. Similar story with %%<del>%% and the %%<s>%%, %%<strike>%% tags - the first is semantic and is displayed struck through by default.//

  We need to do 3 things:
  1. **Add format symbol to the regexp**: find a place in the code that looks like this:
     ---%%([\*\`]) ((?=[^\s]) .+? (?<=[^\s])) \30         # formatting char (eg. *)%%---
     Let's add %%_%% there: %%([\*\`_]) ...%% (the rest of line is the same).
  2. Add the **tag alias** to %%(php) static $tagAliases%% property. Example:
     ---%%(php) static $tagAliases = array('code' => 'code', 'strong' => 'strong', ...%%
     ---Let's change this line to the following:---
     %%(php) static $tagAliases = array('ins' => 'ins', 'code' => 'code', ...%%
  3. Finally, add your tag to the **list of simple formats** in %%(php) HTMLReplaceCallback()%% method:
%%(php)
  } elseif ($formatChar = &$matches[30]) {
    static $charToClass = array('*' => array('strong', 'emphasis'), '`' => array('tt', 'monotype'));
    ...
%%
  Simply add an item to %%(php) $charToClass%%:
%%(php)
  static $charToClass = array('_' => array('ins', 'underlined'), ...
%%
  The first item (%%(php) 'ins'%%) is our tag, the second (%%(php) 'underlined'%%) is the CSS
class to assign to it. It can also be an array to specify several classes - for example:
%%(php)
  array('_' => array('ins', array('underlined', 'inserted')), ...
%%
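  To see the three pieces working together outside of **LiteComment**, here's a stripped-down sketch. %%(php) formatSimple()%% is a hypothetical function with a much simpler regexp than the real one, and it writes tags directly instead of going through %%(php) $tagAliases%%:
%%(php)
  // Self-contained illustration, not LiteComment's actual regexp or callback.
  function formatSimple($text) {
    static $charToClass = array(
      '*' => array('strong', 'emphasis'),
      '`' => array('code', 'monotype'),
      '_' => array('ins', 'underlined'),
    );

    // \1 refers back to the opening character, so *...* and _..._ must match up;
    // the lookarounds keep spaces from touching the format characters.
    $regexp = '/([*`_])(?=\S)(.+?)(?<=\S)\1/u';

    return preg_replace_callback($regexp, function ($m) use ($charToClass) {
      list($tag, $class) = $charToClass[$m[1]];
      return "<$tag class=\"$class\">$m[2]</$tag>";
    }, $text);
  }

  echo formatSimple('*bold* and _underlined_ text');
  // <strong class="emphasis">bold</strong> and <ins class="underlined">underlined</ins> text
%%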

=== New multiline format ===
  A //"multiline format"//, or a //block//, spans multiple lines (obviously). Default multiline formats
are code and blockquotes:
%%(mirror; lc)
  `
  code goes here
  and isn't processed
  `

  "
  this is a blockquote
  with many lines
  "
%%

  Similarly to ((#simple simple formats)) multiline formats are created by identical
strings (1 or more characters) placed on separate lines.

  Let's say we want to add some "attention box" that will be expressed via %%<div>%%
tag with CSS class set to %%attention%%. It will be created with this markup:
%%
  !!
  Please use the forum search function
  before asking your questions!
  !!
%%

  To implement this we need 3 things - just like with a ((#simple simple format)):
  1. **Add format symbol to the regexp**: find a place in the code that looks like this:
     ---%%\n+ (`|&quot;)\s*?\n ([\s\S]+?) \n+\3\s*? $         # multiline insets; [\s\S] is like . + \n%%---
     Let's add %%!!%% there: %%\n+ (`|&quot;|!!)\s* ...%% (the rest of line is the same).
  2. Add the **tag alias** to %%(php) static $tagAliases%% property. Example:
     ---%%(php) static $tagAliases = array('code' => 'code', 'strong' => 'strong', ...%%---
     By default it already includes the tag we want to use (%%<div>%%) so no need
     to do anything here.
  3. Finally, add your tag to the **list of multiline formats** in %%(php) HTMLReplaceCallback()%% method:
     %%(php) static $multilineToTag = array('`' => array('code', 'code'),%%
     We need to change this line to:
     %%(php) ...array('`' => array('code', 'code'), '!!' => array('div', 'attention'), ...%%

  **That's it!**
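
  For the curious, the same idea can be tried in isolation. %%(php) formatBlocks()%% below is a hypothetical function with a simplified regexp (not the one from **LiteComment**), and the %%blockquote%% entry with its %%quote%% class is my own assumption:
%%(php)
  // Self-contained illustration, not LiteComment's actual regexp or callback.
  function formatBlocks($text) {
    static $multilineToTag = array(
      '`'      => array('code', 'code'),
      '&quot;' => array('blockquote', 'quote'),
      '!!'     => array('div', 'attention'),
    );

    // The opening string sits alone on a line; \1 requires the same string to
    // close the block. [\s\S] matches any character including line breaks.
    $regexp = '/^(`|&quot;|!!)[ \t]*\n([\s\S]+?)\n\1[ \t]*$/m';

    return preg_replace_callback($regexp, function ($m) use ($multilineToTag) {
      list($tag, $class) = $multilineToTag[$m[1]];
      return "<$tag class=\"$class\">$m[2]</$tag>";
    }, $text);
  }

  echo formatBlocks("!!\nPlease use the forum search function\nbefore asking your questions!\n!!");
  // <div class="attention">Please use the forum search function
  // before asking your questions!</div>
%%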

=== E-mail masking methods ===
  **LiteComment** is able to replace plain e-mail addresses in texts with obfuscated
alternatives. By default it has 3 spam-protection methods (obfuscating, **JavaScript**
protection and **JavaScript** protection with obfuscating for those who have **JavaScript** turned off)
but you can always add more. How?

  Quite easily:
  1. Add a record to ((#methods %%(php) SetAntiSpamMode()%%));
  2. Add a method named %%MakeEmail_<NAME>%% to the %%LiteComment%% class.

  Let's say we want to display e-mails as images. For this let's go to %%(php) SetAntiSpamMode()%%
function and add a new %%(php) case 'our_method'%% there. Example:
%%(php)
  ...
  case 'jsonly':
  case 'image':   // <- added
    $this->antiSpamMethod = "MakeEmail_$mode";
  }
%%
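  If the dispatch by method name isn't obvious, here's a tiny stand-alone model of it. %%EmailMaskerSketch%%, %%(php) MaskEmail()%% and the %%spelled%% mode are all hypothetical and exist only in this example; the only things taken from **LiteComment** are the %%MakeEmail_<NAME>%% naming rule and the three arguments (account, domain, zone):
%%(php)
  // Self-contained illustration; EmailMaskerSketch is hypothetical and only
  // mirrors the MakeEmail_<NAME> naming rule described above.
  class EmailMaskerSketch {
    protected $antiSpamMethod = 'MakeEmail_asis';

    function SetAntiSpamMode($mode) {
      switch ($mode) {
      case 'asis':
      case 'spelled':   // <- our new mode
        $this->antiSpamMethod = "MakeEmail_$mode";
      }
    }

    // Splits "my@e.mail.net" into 'my', 'e.mail', 'net' and calls the handler.
    function MaskEmail($email) {
      if (!preg_match('/^([^@\s]+)@(\S+)\.([^.\s]+)$/D', $email, $m)) {
        return $email;   // not an address we understand; leave untouched
      }
      return $this->{$this->antiSpamMethod}($m[1], $m[2], $m[3]);
    }

    function MakeEmail_asis($account, $domain, $zone) {
      return "$account@$domain.$zone";
    }

    function MakeEmail_spelled($account, $domain, $zone) {
      return "$account (at) $domain (dot) $zone";
    }
  }

  $masker = new EmailMaskerSketch;
  $masker->SetAntiSpamMode('spelled');
  echo $masker->MaskEmail('my@e.mail.net');   // my (at) e.mail (dot) net
%%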

  Now let's create %%(php) MakeEmail_image%% method (it needs ((http://php.net/manual/en/book.image.php **GD library**)) for PHP):
%%(php)
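  function MakeEmail_image($account, $domain, $zone) {
    $img = imagecreatetruecolor(100, 30);
    // True-color images start out black, so paint a white background first.
    $white = imagecolorallocate($img, 255, 255, 255);
    imagefilledrectangle($img, 0, 0, 99, 29, $white);
    $color = imagecolorallocate($img, 0, 0, 0);
    imagestring($img, 3, 0, 0, "$account@$domain.$zone", $color);

    // imagepng() prints the image, so capture its output into a string.
    ob_start();
    imagepng($img);
    $data = ob_get_clean();

    // No chunk_split() here: line breaks don't belong in a data: URI.
    $base64 = base64_encode($data);
    $html = '<img src="data:image/png;base64,'.$base64.'" alt="E-mail" />';
    return $html;
  }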
%%
  //The code working with **GD** is purely for demonstration and has some limitations//
(e.g. we could [[php:imagettfbbox determine]] the width of e-mail string before creating the image). However, I think it demonstrates well how things generally work.

  Download ((litecomment.rar **LiteComment**)). You can check its sandbox ((litecomment_test.php here)).

  //Please **drop a comment below** if you have questions or if you're using **LiteComment** in your project!//